[BUGFIX] Fix misleading result format docs for ExpectColumnValuesToBeOfType (#11076) #11880
…OfType (fivetran#11076)

The docstring Code Examples for ExpectColumnValuesToBeOfType showed the full Column Map result format (element_count, unexpected_count, partial_unexpected_list, etc.) for all backends. In practice this format is only returned when Pandas is used with a column whose dtype is 'object' (row-level inspection). In every other case, namely SQL backends (including Databricks, Snowflake, SQL Server, PostgreSQL, Trino), Spark, and Pandas with non-object dtypes, the expectation validates the column's schema-level data type and returns only {"observed_value": "<type_name>"}.

Users relying on the documented format for Databricks or Spark were silently getting a different structure and had no way to know which format to expect.

Changes:
- Replaced the misleading Code Examples in the class docstring with a clear "Result Format" section that documents both shapes and explains when each applies.
- Added a unit test that asserts 'observed_value' is present (and 'element_count' is absent) when running against a Pandas non-object column, preventing future regressions where the aggregate path accidentally switches to the map format (or vice versa).

Fixes fivetran#11076
A new contributor, HUZZAH! Welcome and thanks for joining our community.

In order to accept a pull request we require that all contributors sign our Contributor License Agreement. We have two different CLAs, depending on whether you are contributing to GX in a personal or professional capacity. Please sign the one that is applicable to your situation so that we may accept your contribution:

Individual Contributor License Agreement v1.0

Once you have signed the CLA, you can add a comment with the text

Please reach out to the #gx-community-support channel on our Slack if you have any questions or if you have already signed the CLA and are receiving this message in error.

Users missing a CLA: creazyfrog
Is this PR still relevant? If so, what is blocking it? Is there anything you can do to help move it forward?

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions 🙇
Summary
Fixes #11076
The docstring Code Examples for `ExpectColumnValuesToBeOfType` showed the full Column Map result format (`element_count`, `unexpected_count`, `partial_unexpected_list`, etc.) for all backends. In practice this format is only returned when Pandas is used with a column whose dtype is `object` (row-level type inspection). In every other case, namely SQL backends (Databricks, Snowflake, SQL Server, PostgreSQL, Trino), Spark, and Pandas with non-object dtypes, the expectation validates the column's schema-level data type and returns only `{"observed_value": "<type_name>"}`, making the documented examples actively misleading.

Users on Databricks or Spark opened issue #11076 because they expected the full map format based on the docs.
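For code that consumes these results, the two shapes described above can be told apart by their keys. The following is a minimal sketch, not part of this PR or of the GX API: `classify_result`, `describe`, and `MAP_FORMAT_KEYS` are hypothetical names, and the sample dicts are hand-written stand-ins rather than output captured from a real validation run.

```python
from typing import Any, Mapping

# Keys that, per the description above, appear only in the full Column Map format.
MAP_FORMAT_KEYS = ("element_count", "unexpected_count", "partial_unexpected_list")


def classify_result(result: Mapping[str, Any]) -> str:
    """Classify a validation `result` dict as "map", "aggregate", or "unknown"."""
    if any(key in result for key in MAP_FORMAT_KEYS):
        return "map"
    if "observed_value" in result:
        return "aggregate"
    return "unknown"


def describe(result: Mapping[str, Any]) -> str:
    """Build a one-line summary that works for either result shape."""
    kind = classify_result(result)
    if kind == "map":
        unexpected = result.get("unexpected_count", "?")
        total = result.get("element_count", "?")
        return f"row-level check: {unexpected} of {total} values had an unexpected type"
    if kind == "aggregate":
        return f"schema-level check: column type is {result['observed_value']}"
    return "unrecognized result shape"


# Hand-written sample dicts for illustration; not captured from a real run.
aggregate_result = {"observed_value": "int64"}
map_result = {"element_count": 3, "unexpected_count": 1, "partial_unexpected_list": ["x"]}

print(describe(aggregate_result))
print(describe(map_result))
```

Checking for the map-only keys first means the helper does not need to assume whether the map format also carries `observed_value`.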
Root Cause
`_validate_pandas` (non-object path), `_validate_sqlalchemy`, and `_validate_spark` perform a schema-level aggregate check. There are no "unexpected rows" to enumerate, so the full Column Map output (`element_count`, `unexpected_count`, `partial_unexpected_list`, etc.) is fundamentally unavailable. The only meaningful result field is `observed_value` (the actual column type). The Code Examples in the docstring were copied from a different context (the Pandas row-level map path) without being adjusted for the aggregate paths.
Changes
`great_expectations/expectations/core/expect_column_values_to_be_of_type.py`
- Replaced the misleading Code Examples in the class docstring with a "Result Format" section that documents both shapes and when each applies:
  - Aggregate format: `{"observed_value": "<type>"}`
  - Map format: the full Column Map output (`element_count`, `unexpected_count`, etc.)
- The documented example now shows the `observed_value` format, which is what the vast majority of users actually see.

`tests/expectations/core/test_expect_column_values_to_be_of_type.py`
- Added `test_expect_column_values_to_be_of_type_result_contains_observed_value_for_pandas`, a unit test that asserts `observed_value` is present and `element_count` is absent for a Pandas non-object column, preventing a future regression where the aggregate path accidentally returns the map format.
Test plan
- `test_expect_column_values_to_be_of_type_result_contains_observed_value_for_pandas` (new unit test)
- `result` contains only `observed_value` for all aggregate-mode paths
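The assertion at the heart of the new unit test might look roughly like the sketch below. It is an illustration only: `assert_aggregate_result` is a hypothetical helper, the GX setup that produces `result` from a Pandas batch is omitted, and the stubbed dict is a hand-written stand-in rather than real output.

```python
def assert_aggregate_result(result: dict) -> None:
    """Check the shape expected from the schema-level (aggregate) path."""
    assert "observed_value" in result, "aggregate path must report the column type"
    assert "element_count" not in result, "aggregate path must not return map-format keys"


def test_aggregate_shape_with_stubbed_result() -> None:
    # Hypothetical stand-in for the `result` of validating a Pandas non-object column.
    stubbed = {"observed_value": "int64"}
    assert_aggregate_result(stubbed)
```

Asserting the absence of `element_count`, and not only the presence of `observed_value`, is what would catch the aggregate path switching to the map format.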